Results 1 - 17 of 17
1.
BMC Public Health ; 23(1): 998, 2023 05 30.
Article in English | MEDLINE | ID: covidwho-20234132

ABSTRACT

BACKGROUND: The current study examines the negative impact of the coronavirus disease 2019 (COVID-19) emergency declarations on physical activity among community-dwelling older adults in Japan who participated in a physical activity measurement program. METHODS: This retrospective observational study included 1,773 community-dwelling older adults (aged 74.6 ± 6.3 years, 53.9% women) who had participated in the physical activity measurement project from February 2020 to July 2021. We measured physical activity using a tri-axial accelerometer over 547 consecutive days. Three emergency declarations, which requested that people avoid going outside, were issued during the observation period. We multiply imputed missing values for daily physical activity, namely steps, light physical activity (LPA), and moderate-to-vigorous physical activity (MVPA), for several patterns of datasets defined by the maximum person-level missing rate. We mainly report the results for the dataset restricted to a maximum missing rate of less than 50% (n = 1,056). Other results are reported in the supplemental file. Changes in physical activity before and after the start of each emergency declaration were examined by a regression discontinuity design (RDD) with 14-, 28-, and 56-day bandwidths. RESULTS: For all the participants in the multiply imputed data with the 14-day bandwidth, steps (coefficient = 964.3 steps), LPA (coefficient = 5.5 min), and MVPA (coefficient = 4.9 min) increased after the first emergency declaration. However, the effects were attenuated as the RDD bandwidths were widened. No consistent negative impact was observed after the second and third declarations. After the second declaration, steps (coefficient = -609.7 steps), LPA (coefficient = -4.6 min), and MVPA (coefficient = -2.8 min) decreased with the 14-day bandwidth.
On the other hand, steps (coefficient = 143.8 steps) and MVPA (coefficient = 1.3 min) increased with the 56-day bandwidth. For the third declaration, LPA consistently decreased with all the bandwidths (coefficients = -2.1, -3.0, and -0.8 min for the 14-, 28-, and 56-day bandwidths), whereas steps (coefficient = -529 steps) and MVPA (coefficient = -2.6 min) decreased only with the 28-day bandwidth. CONCLUSIONS: For community-dwelling older adults who regularly self-monitor their physical activity, the current study found no evidence of consistently negative impacts of the COVID-19 emergency declarations.
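The regression discontinuity comparison can be sketched as a local linear fit on either side of a declaration date. This is a minimal illustration, not the authors' model: it assumes a single daily series (for example, the mean step count per day), a uniform kernel, separate linear trends before and after the cutoff, and no covariates. The function name `rdd_jump` is hypothetical. The coefficient on the post-cutoff indicator is the estimated jump, and rerunning with 14-, 28-, and 56-day windows mirrors the bandwidth comparison in the abstract.

```python
import numpy as np


def rdd_jump(days, y, cutoff=0, bandwidth=14):
    """Estimate the jump in y at `cutoff` with a local linear RDD (hypothetical helper).

    Uses only days in [cutoff - bandwidth, cutoff + bandwidth) and fits
        y = a + tau * D + b1 * (t - c) + b2 * D * (t - c),
    where D = 1 on or after the cutoff. Returns tau.
    """
    days = np.asarray(days, dtype=float)
    y = np.asarray(y, dtype=float)
    centered = days - cutoff
    in_window = (centered >= -bandwidth) & (centered < bandwidth)
    t = centered[in_window]
    d = (t >= 0).astype(float)
    # Columns: intercept, post-cutoff indicator, trend, change in trend after cutoff.
    design = np.column_stack([np.ones_like(t), d, t, d * t])
    coef, *_ = np.linalg.lstsq(design, y[in_window], rcond=None)
    return coef[1]


# Synthetic example with a true jump of 900 steps at day 0.
days = np.arange(-60, 60)
post = (days >= 0).astype(float)
steps = 5000 + 10 * days + 900 * post + 5 * post * days
for bw in (14, 28, 56):
    print(bw, round(rdd_jump(days, steps, bandwidth=bw), 1))
```

With real, noisy data, a wider window lets days farther from the declaration shape the fitted trends, which is one plausible reason estimates can change as the bandwidth grows.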


Subject(s)
COVID-19 , Independent Living , Humans , Female , Aged , Male , Pandemics , Exercise , Retrospective Studies
3.
Spat Stat ; 54: 100730, 2023 Apr.
Article in English | MEDLINE | ID: covidwho-2249147

ABSTRACT

Survival models that incorporate frailties are common for time-to-event data collected over distinct spatial regions. Although incomplete data are unavoidable and a common complication in the statistical analysis of spatial survival data, most researchers still ignore the missing data problem. In this paper, we propose a geostatistical modeling approach for incomplete spatially correlated survival data. We achieve this by accounting for missingness in the outcome, covariates, and spatial locations. In the process, we analyze incomplete spatially referenced survival data using a Weibull model for the baseline hazard function and correlated log-Gaussian frailties to model spatial correlation. We illustrate the proposed method with simulated data and an application to geo-referenced COVID-19 data from Ghana. There are several disagreements between the parameter estimates and credible interval widths obtained using our proposed approach and those obtained from complete case analysis. Based on these findings, we argue that our approach provides more reliable parameter estimates and has higher predictive accuracy.
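The model components named above can be written down directly. The sketch below is illustrative only and assumes a proportional-hazards Weibull parameterisation h(t) = λρt^(ρ-1)exp(x'β + w) and, purely as an example, an exponential covariance for the spatial Gaussian effects w, so that exp(w) is a log-Gaussian frailty. The paper's exact parameterisation and covariance function may differ, and all function names are hypothetical.

```python
import numpy as np


def linear_predictor(x, beta, w):
    # x'beta plus the spatial random effect w; exp(w) acts as the frailty.
    return np.dot(x, beta) + w


def weibull_frailty_hazard(t, x, beta, w, lam, rho):
    """h(t) = lam * rho * t**(rho - 1) * exp(x'beta + w)."""
    return lam * rho * t ** (rho - 1) * np.exp(linear_predictor(x, beta, w))


def weibull_frailty_survival(t, x, beta, w, lam, rho):
    """S(t) = exp(-lam * t**rho * exp(x'beta + w))."""
    return np.exp(-lam * t ** rho * np.exp(linear_predictor(x, beta, w)))


def simulate_spatial_effects(coords, sigma2, phi, rng):
    """Draw w ~ N(0, sigma2 * exp(-d / phi)); the exponential covariance is an assumption."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    cov = sigma2 * np.exp(-dist / phi)
    return rng.multivariate_normal(np.zeros(len(coords)), cov)


rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(5, 2))  # hypothetical region centroids
w = simulate_spatial_effects(coords, sigma2=0.5, phi=2.0, rng=rng)
x = np.array([1.0, 0.0])                  # hypothetical covariates
beta = np.array([0.3, -0.2])
print(weibull_frailty_survival(5.0, x, beta, w, lam=0.01, rho=1.5))
```

Nearby regions receive correlated draws of w, so their survival curves are shifted in similar directions.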

4.
Int J Environ Res Public Health ; 20(3)2023 01 20.
Article in English | MEDLINE | ID: covidwho-2242618

ABSTRACT

The emergence of hyper-transmissible SARS-CoV-2 variants that rapidly became prevalent throughout the world in 2022 made it clear that extensive vaccination campaigns cannot be the sole measure to stop COVID-19. However, the effectiveness of control and mitigation strategies, such as the closure of non-essential businesses and services, is debated. To assess the individual behaviours most strongly associated with SARS-CoV-2 infection, a questionnaire-based case-control study was carried out in Tuscany, Central Italy, from May to October 2021. At the testing sites, individuals were invited to answer an online questionnaire after being notified of the test result. The questionnaire collected information about the test result, general characteristics of the respondents, and behaviours and places attended in the week prior to the test or symptom onset. We analysed 440 questionnaires. Behavioural differences between positive and negative subjects were assessed through logistic regression models, adjusting for a fixed set of confounders. A ridge regression model was also specified. Attending nightclubs, open-air bars or restaurants and crowded clubs, outdoor sporting events, crowded public transportation, and working in healthcare were associated with an increased infection risk. A negative association with infection, besides face mask use, was observed for attending open-air shows and sporting events in indoor spaces, visiting and hosting friends, attending courses in indoor spaces, performing sport activities (both indoor and outdoor), attending private parties, religious ceremonies, libraries, and indoor restaurants. These results might suggest that during the study period people maintained a particularly responsible and prudent approach when engaging in everyday activities to avoid spreading the virus.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , COVID-19/epidemiology , Pandemics/prevention & control , Case-Control Studies , Italy/epidemiology
5.
J Biomed Inform ; 139: 104295, 2023 03.
Article in English | MEDLINE | ID: covidwho-2210676

ABSTRACT

Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients' predictors and outcomes of interest. However, these datasets often have missing values in a high proportion of cases, and removing those cases may introduce severe bias. Several multiple imputation algorithms have been proposed to recover the missing information under an assumed missingness mechanism. Each algorithm has strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, selecting each algorithm's parameters and making data-related modeling choices are both crucial and challenging. In this paper we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. Extensive experiments show that our approach can effectively highlight the most promising and performant missing-data handling strategy for our case study. Moreover, our methodology allowed a better understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and to datasets containing heterogeneous data types.
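As a point of reference for the complete-case inverse probability weighting mentioned above, the sketch below weights each complete case by the inverse of its estimated probability of being observed. It is a hypothetical, minimal version: response probabilities are estimated as the observed fraction within each level of one categorical covariate (a logistic response model would be more typical in practice), and it estimates a simple mean rather than a regression model.

```python
import numpy as np


def ipw_complete_case_mean(y, observed, group):
    """Mean of y from complete cases, weighted by 1 / P(observed | group).

    Response probabilities are the observed fraction within each group;
    every group must contain at least one observed case.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    group = np.asarray(group)
    weights = np.zeros(len(y))
    for g in np.unique(group):
        in_group = group == g
        p_observed = observed[in_group].mean()
        weights[in_group & observed] = 1.0 / p_observed
    # Normalised (Hajek-style) weighted mean over complete cases only.
    return np.sum(weights[observed] * y[observed]) / np.sum(weights[observed])


# Group 0 (outcome 0) is observed half the time; group 1 (outcome 10) a quarter of the time.
y = [0, 0, np.nan, np.nan, 10, np.nan, np.nan, np.nan]
observed = [1, 1, 0, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(np.nanmean(y))                               # unweighted complete-case mean
print(ipw_complete_case_mean(y, observed, group))  # inverse-probability-weighted mean
```

Here the complete-case mean is 10/3, because group 1 is under-represented among observed cases, while the weighted estimate is 5, the full-data mean if each group's missing outcomes resemble its observed ones.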


Subject(s)
COVID-19 , Humans , Algorithms , Research Design , Bias , Probability
6.
Front Public Health ; 10: 913636, 2022.
Article in English | MEDLINE | ID: covidwho-2022942

ABSTRACT

Introduction: The high co-occurrence of tobacco smoking and depression is a major public health concern during the novel coronavirus disease-2019 pandemic. However, no studies have dealt with missing values when assessing depression. Therefore, the present study aimed to examine the effect of tobacco smoking on depressive symptoms using a multiple imputation technique. Methods: This research was a longitudinal study using data from four waves of the China Health and Retirement Longitudinal Study conducted between 2011 and 2018, and the final sample consisted of 74,381 observations across all four waves of data collection. The present study employed a multiple imputation technique to deal with missing values, and a fixed effects logistic regression model was used for the analysis. Results: The results of fixed effects logistic regression showed that heavy smokers had 20% higher odds of suffering from depressive symptoms than those who never smoked. Compared with those who never smoked, short-term and moderate-term quitters had 30% and 22% higher odds of suffering from depressive symptoms, respectively. The magnitudes of the odds ratios for the variables short-term quitters, moderate-term quitters, and long-term quitters decreased in absolute terms with increasing time since quitting. The sub-group analysis for men and women found that heavy male smokers and short-term and moderate-term male quitters had higher odds of suffering from depressive symptoms than those who never smoked. However, associations between smoking status and depressive symptoms were not significant for women. Conclusions: The empirical findings suggested that among Chinese middle-aged and older adults, heavy smokers and short-term and moderate-term quitters have higher odds of suffering from depressive symptoms than those who never smoked. Moreover, among former smokers, the probability of having depressive symptoms decreased with longer duration since quitting.
Nevertheless, the association between depressive symptoms and smoking among Chinese middle-aged and older adults is not straightforward and may vary according to gender. These results may have important implications that support the government in allocating more resources to smoking cessation programs to help middle-aged and older smokers, particularly men.


Subject(s)
COVID-19 , Depression , Aged , China/epidemiology , Depression/epidemiology , Female , Humans , Longitudinal Studies , Male , Middle Aged , Risk Factors , Tobacco Smoking
7.
Vaccines (Basel) ; 10(8)2022 Aug 09.
Article in English | MEDLINE | ID: covidwho-1979449

ABSTRACT

Hispanic communities have been disproportionately affected by economic disparities. These inequalities have put Hispanics at an increased risk for preventable health conditions. In addition, the CDC reports that Hispanics have a 1.5-fold higher COVID-19 infection rate and low vaccination rates. This study aims to identify the driving factors for COVID-19 vaccine hesitancy among Hispanic survey participants in the Rio Grande Valley. Our analysis used machine learning methods to identify significant associations between medical, economic, and social factors and the uptake of, and willingness to receive, the COVID-19 vaccine. A combination of three classification methods (i.e., logistic regression, decision trees, and support vector machines) was used to classify observations based on the value of the targeted responses received and to extract a robust subset of factors. Our analysis revealed different medical, economic, and social associations for different target population groups (i.e., males and females). The analysis of males achieved a Matthews correlation coefficient (MCC) of 0.972, the analysis of females achieved an MCC of 0.805, and the combined analysis of males and females achieved 0.797. Specifically, several medical and economic factors and sociodemographic characteristics are more prevalent in vaccine-hesitant groups, such as asthma, hypertension, mental health problems, financial strain due to COVID-19, gender, lack of health insurance plans, and limited test availability.
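The Matthews correlation coefficient reported above combines all four cells of a binary confusion matrix into a single value between -1 and 1, which makes it less sensitive to class imbalance than accuracy. A minimal sketch of the standard formula follows; returning 0 when a marginal total is zero is a common convention, not something stated in the abstract, and the counts in the example are hypothetical.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # undefined when any marginal total is zero; 0 by convention
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a vaccine-hesitant (positive) vs. non-hesitant classifier.
print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```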

8.
Environmetrics ; : e2751, 2022 Jul 31.
Article in English | MEDLINE | ID: covidwho-1966045

ABSTRACT

Recent ecological analyses suggest air pollution exposure may increase susceptibility to and severity of coronavirus disease 2019 (COVID-19). Individual-level studies are needed to clarify the relationship between air pollution exposure and COVID-19 outcomes. We conduct an individual-level analysis of the effect of long-term exposure to air pollution and weather on peak COVID-19 severity. We develop a Bayesian multinomial logistic regression model with a multiple imputation approach to impute partially missing health outcomes. Our approach is based on the stick-breaking representation of the multinomial distribution, which offers computational advantages but presents challenges in interpreting regression coefficients. We propose a novel inferential approach to address these challenges. In a simulation study, we demonstrate our method's ability to impute missing outcome data and improve estimation of regression coefficients compared to a complete case analysis. In our analysis of 55,273 COVID-19 cases in Denver, Colorado, increased annual exposure to fine particulate matter in the year prior to the pandemic was associated with increased risk of severe COVID-19 outcomes. We also found COVID-19 disease severity to be associated with interactions between exposures. Our individual-level analysis fills a gap in the literature and helps to elucidate the association between long-term exposure to air pollution and COVID-19 outcomes.
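The stick-breaking representation maps K-1 linear predictors to K category probabilities: each predictor sets the conditional probability of category k given that the outcome is not in an earlier category, and the last category takes whatever probability remains. That conditional structure is why the regression coefficients are harder to interpret than in a baseline-category logit model. A minimal sketch, assuming a logistic link for each stick-breaking weight (the paper's exact link may differ); the function name is hypothetical.

```python
import numpy as np


def stick_breaking_probs(eta):
    """Map K-1 linear predictors to K category probabilities.

    v_k = logistic(eta_k) is P(category k | not in categories 1..k-1);
    the last category takes the leftover stick.
    """
    eta = np.asarray(eta, dtype=float)
    v = 1.0 / (1.0 + np.exp(-eta))
    # remaining[k] = prod_{j<k} (1 - v_j): stick length left before break k.
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)])
    return np.append(v * remaining[:-1], remaining[-1])


# Four hypothetical severity levels from three linear predictors.
probs = stick_breaking_probs([0.5, -1.0, 0.2])
print(probs, probs.sum())
```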

9.
Cancer Epidemiol ; 79: 102198, 2022 08.
Article in English | MEDLINE | ID: covidwho-1930785

ABSTRACT

INTRODUCTION: Monitoring early diagnosis is a priority of cancer policy in England. However, stage information has often been unavailable for a large proportion of patients, which may bias temporal comparisons. We previously estimated, using multiple imputation, that early-stage diagnosis of colorectal cancer rose from 32% to 44% during 2008-2013. Here we examine the underlying assumptions of multiple imputation for missing stage using the same dataset. METHODS: Individually linked cancer registration, Hospital Episode Statistics (HES), and audit data were examined. Six imputation models including different interaction terms, post-diagnosis treatment, and survival information were assessed, and comparisons were drawn with the a priori optimal model. Models were further tested by setting stage values to missing for some patients under one plausible mechanism, then comparing actual and imputed stage distributions for these patients. Finally, a pattern-mixture sensitivity analysis was conducted. RESULTS: Data from 196,511 colorectal patients were analysed, with 39.2% missing stage. Inclusion of survival time increased the accuracy of imputation: the odds ratio for change in early-stage diagnosis during 2008-2013 was 1.7 (95% CI: 1.6, 1.7) with survival to 1 year included, compared with 1.9 (95% CI: 1.9, 2.0) with no survival information. Imputation estimates of stage were accurate in one plausible simulation. Pattern-mixture analyses indicated that the conclusions of our previous analysis would only change materially if stage were misclassified for 20% of the patients categorised as late stage. INTERPRETATION: Multiple imputation models can substantially reduce bias from missing stage, but data on patients' one-year survival should be included for highest accuracy.


Subject(s)
Early Detection of Cancer , Neoplasms , Bias , Data Collection , Humans , Neoplasms/diagnosis , Neoplasms/epidemiology , Odds Ratio
10.
Mov Disord ; 37(8): 1749-1755, 2022 08.
Article in English | MEDLINE | ID: covidwho-1898912

ABSTRACT

BACKGROUND: Telemedicine has become standard in clinical care and research during the coronavirus disease 2019 pandemic. Remote administration of the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) Part III (Motor Examination) precludes rating all items, because the Rigidity and Postural Stability items (six scores) require in-person rating. OBJECTIVE: The objective of this study was to determine imputation accuracy for total-sum and item-specific MDS-UPDRS Motor Examination scores in remote administration. METHODS: We applied multivariate imputation by chained equations techniques in a cross-sectional dataset where patients had one MDS-UPDRS rating (International Translational Program, n = 8,588) and in a longitudinal dataset where patients had multiple ratings (Rush Program, n = 396). Successful imputation was stringently defined as (1) a generalized Lin's concordance correlation coefficient >0.95, reflecting near-perfect agreement between the total-sum score with complete data and the surrogate score calculated without patients' actual Rigidity and Postural Stability scores; and (2) perfect agreement for item-level scores for the Rigidity and Postural Stability items. RESULTS: For the total-sum score with Rigidity and Postural Stability scores withdrawn, multivariate imputation by chained equations reached near-perfect agreement with the original total-sum score, using either one or multiple visits. However, at the item level, agreement between the surrogate and actual Rigidity and Postural Stability item scores always fell below the perfect-agreement threshold. CONCLUSIONS: The MDS-UPDRS Part III total-sum score, a key clinical outcome in research and in clinical practice, can be accurately imputed without the Rigidity and Postural Stability items that cannot be rated by telemedicine. No formula, however, allows for specific item-level imputation.
When Rigidity and Postural Stability item scores are of key clinical or research interest, patients with PD must be scored in person. © 2022 International Parkinson and Movement Disorder Society.
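Lin's concordance correlation coefficient measures agreement rather than mere correlation, penalising both scatter and systematic shifts between two scores. The sketch below is the basic two-rater form with population (1/n) moments; the study used a generalized version, so this is illustrative only. The function name and example scores are hypothetical.

```python
import numpy as np


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (basic two-rater form).

    rho_c = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using 1/n moments.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    s_xy = np.mean((x - mx) * (y - my))
    return 2 * s_xy / (x.var() + y.var() + (mx - my) ** 2)


actual = np.array([20, 35, 41, 28, 50, 33])  # hypothetical total-sum scores
surrogate = actual + 2                        # perfectly correlated but shifted
print(np.corrcoef(actual, surrogate)[0, 1], lins_ccc(actual, surrogate))
```

A surrogate that is perfectly correlated with the actual score but offset by a constant still scores below 1, which is why a concordance criterion such as >0.95 is stricter than a correlation threshold.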


Subject(s)
COVID-19 , Parkinson Disease , Telemedicine , Cross-Sectional Studies , Humans , Mental Status and Dementia Tests , Parkinson Disease/diagnosis , Severity of Illness Index
11.
Journal of Policy Analysis and Management ; : 27, 2022.
Article in English | Web of Science | ID: covidwho-1894620

ABSTRACT

Official poverty estimates for the United States are presented annually, based on a family unit's annual resources, and reported with a considerable lag. This study introduces a framework to produce monthly estimates of the Supplemental Poverty Measure and official poverty measure, based on a family unit's monthly income, and with a two-week lag. We argue that a shorter accounting period and more timely estimates of poverty better account for intra-year income volatility and better inform the public of current economic conditions. Our framework uses two versions of the Current Population Survey to estimate monthly poverty while accounting for changes in policy, demographic composition, and labor market characteristics. Validation tests demonstrate that our monthly poverty estimates closely align with observed trends in the Survey of Income & Program Participation from 2004 to 2016 and trends in hardship during the COVID-19 pandemic. We apply the framework to measure trends in monthly poverty from January 1994 through September 2021. Monthly poverty rates generally declined in the 1990s, increased throughout the 2000s, and declined after the Great Recession through the onset of the COVID-19 pandemic. Within-year variation in monthly poverty rates, however, has generally increased. Among families with children, within-year variation in monthly poverty rates is comparable to between-year variation, largely due to the average family with children receiving 37 percent of its annual income transfers in a single month through one-time tax credit payments. Moving forward, researchers can apply our framework to produce monthly poverty rates whenever more timely estimates are desired.

12.
Int J Stat Med Res ; 11: 1-11, 2022 Jan 28.
Article in English | MEDLINE | ID: covidwho-1699235

ABSTRACT

The COVID-19 pandemic has resulted in a disproportionate burden on racial and ethnic minority groups, but incompleteness in surveillance data limits understanding of disparities. CDC's case-based surveillance system contains case-level information on most COVID-19 cases in the United States. Data analyzed in this paper contain COVID-19 cases with case-level information through September 25, 2020, which represent 70.9% of all COVID-19 cases reported to CDC during the period. Case-level surveillance data are used to investigate COVID-19 disparities by race/ethnicity, sex, and age. However, demographic information on race and ethnicity is missing for a substantial percentage of COVID-19 cases (e.g., 35.8% and 47.2% of cases analyzed were missing race and ethnicity information, respectively). Our goal in this study was to impute missing race and ethnicity to derive more accurate incidence and incidence rate ratio (IRR) estimates for different racial and ethnic groups, and evaluate the results from imputation compared to complete case analysis, which involves removing cases with missing race/ethnicity information from the analysis. Two multiple imputation (MI) models were developed. Model 1 imputes race using six binary race variables, and Model 2 imputes race as a composite multinomial variable. Our evaluation found that compared with complete case analysis, MI reduced biases and improved coverage on incidence and IRR estimates for all race/ethnicity groups, except for the Non-Hispanic Multiple/other group. Our research highlights the importance of supplementing complete case analysis with additional methods of analysis to better describe racial and ethnic disparities. When race and ethnicity data are missing, multiple imputation may provide more accurate incidence and IRR estimates to monitor these disparities in tandem with efforts to improve the collection of race and ethnicity information for pandemic surveillance.
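Estimates from multiply imputed datasets are typically combined with Rubin's rules: the pooled estimate is the average across the m imputations, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The abstract does not spell out its pooling step, so this is a minimal generic sketch with a hypothetical function name and made-up inputs; for ratio measures such as IRRs, pooling would normally be done on the log scale.

```python
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, denominator m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log-IRR estimates and variances from five imputed datasets.
log_irr = [0.62, 0.58, 0.66, 0.60, 0.64]
var_log_irr = [0.0040, 0.0042, 0.0039, 0.0041, 0.0040]
est, total_var = rubins_rules(log_irr, var_log_irr)
print(est, total_var ** 0.5)  # pooled log-IRR and its standard error
```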

13.
Popul Health Metr ; 19(1): 44, 2021 11 04.
Article in English | MEDLINE | ID: covidwho-1503922

ABSTRACT

BACKGROUND: Poor data quality is limiting the use of data sourced from routine health information systems (RHIS), especially in low- and middle-income countries. An important component of this data quality issue comes from missing values, where health facilities, for a variety of reasons, fail to report to the central system. METHODS: Using data from the health management information system in the Democratic Republic of the Congo and the advent of the COVID-19 pandemic as an illustrative case study, we implemented seven commonly used imputation methods and evaluated their performance in terms of minimizing bias in imputed values and in parameter estimates generated through subsequent analytical techniques, namely segmented regression, which is widely used in interrupted time series studies, and pre-post comparisons through paired Wilcoxon rank-sum tests. We also examined the performance of these imputation methods under different missingness mechanisms and tested their stability to changes in the data. RESULTS: For regression analyses, when the data contained less than 20% missing values, coefficient estimates did not differ substantially across methods, except for mean imputation and exclusion and interpolation. However, as the missing proportion grew, k-NN started to produce biased estimates. Machine learning algorithms, i.e. missForest and k-NN, were also found to lack robustness to small changes in the data or to consecutive missingness. On the other hand, multiple imputation methods generated the least biased estimates overall and were the most robust to all changes in the data. They also produced smaller standard errors than single imputations. For pre-post comparisons, all methods produced p values less than 0.01, regardless of the amount of missingness introduced, suggesting low sensitivity of Wilcoxon rank-sum tests to the imputation method used.
CONCLUSIONS: We recommend the use of multiple imputation in addressing missing values in RHIS datasets and appropriate handling of data structure to minimize imputation standard errors. In cases where necessary computing resources are unavailable for multiple imputation, one may consider seasonal decomposition as the next best method. Mean imputation and exclusion and interpolation, however, always produced biased and misleading results in the subsequent analyses, and thus, their use in the handling of missing values should be discouraged.
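To see how mean imputation can distort a segmented regression, the sketch below fits a model with a level change and a slope change at the interruption, once to a complete series and once after replacing missing months with the overall series mean. It is a simplified illustration under assumed conditions (one facility's monthly series, ordinary least squares, no seasonality or autocorrelation adjustment); the function names and data are hypothetical, not the study's.

```python
import numpy as np


def mean_impute(y):
    # Replace every missing month with the mean of the observed months.
    y = np.asarray(y, dtype=float).copy()
    y[np.isnan(y)] = np.nanmean(y)
    return y


def segmented_regression(y, interruption):
    """OLS fit of y_t = b0 + b1*t + b2*post_t + b3*(t - T0)*post_t.

    Returns [b0, b1, b2, b3]: intercept, pre-period trend, level change, slope change.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= interruption).astype(float)
    design = np.column_stack([np.ones_like(t), t, post, (t - interruption) * post])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical 48-month series with an interruption at month 24.
months = np.arange(48)
post = (months >= 24).astype(float)
y_full = 100 + 2 * months + 30 * post - 1 * (months - 24) * post
y_obs = y_full.copy()
y_obs[[5, 6, 7, 30, 31]] = np.nan  # months the facility failed to report

print(segmented_regression(y_full, 24))               # recovers [100, 2, 30, -1]
print(segmented_regression(mean_impute(y_obs), 24))   # distorted by mean imputation
```

Because the series trends over time, the overall mean is far from the true values in early months, so every coefficient shifts even though only five months are missing.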


Subject(s)
COVID-19 , Health Information Systems , Democratic Republic of the Congo/epidemiology , Humans , Pandemics , SARS-CoV-2
14.
Contemp Clin Trials ; 108: 106494, 2021 09.
Article in English | MEDLINE | ID: covidwho-1283967

ABSTRACT

For many years there has been a consensus among the clinical research community that intention-to-treat (ITT) analysis represents the correct approach for the vast majority of trials. Recent worldwide regulatory guidance for pharmaceutical industry trials has allowed discussion of alternatives to the ITT approach to analysis; different treatment effects can be considered which may be more clinically meaningful and more relevant to patients and prescribers. The key concept is that of a trial "estimand", a precise description of the estimated treatment effect. The strategy chosen to account for patients who discontinue treatment or take alternative medications that are not part of the randomised treatment regimen is an important determinant of this treatment effect. One strategy to account for these events is treatment policy, which corresponds to an ITT approach. Alternative, equally valid strategies address what the treatment effect would be if the patient actually took the treatment or did not use specific alternative medication. There is no single right answer as to which strategy is most appropriate; the solution depends on the key clinical question of interest. The estimands framework discussed in the new guidance has been particularly useful in the context of the current COVID-19 pandemic and has clarified what choices are available to account for the impact of COVID-19 on clinical trials. Specifically, an ITT approach addresses a treatment effect that may not be generalisable beyond the current pandemic.


Subject(s)
COVID-19 , Pandemics , Humans , SARS-CoV-2
15.
BMC Med Res Methodol ; 20(1): 208, 2020 08 12.
Article in English | MEDLINE | ID: covidwho-713161

ABSTRACT

BACKGROUND: The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including an inevitably higher rate of missing outcome data, with new and non-standard reasons for missingness. International drug trial guidelines recommend that trialists review plans for handling missing data in the conduct and statistical analysis, but clear recommendations are lacking. METHODS: We present a four-step strategy for handling missing outcome data in the analysis of randomised trials that are ongoing during a pandemic. We consider handling missing data arising due to (i) participant infection, (ii) treatment disruptions and (iii) loss to follow-up. We consider both settings where treatment effects for a 'pandemic-free world' and a 'world including a pandemic' are of interest. RESULTS: In any trial, investigators should: (1) clarify the treatment estimand of interest with respect to the occurrence of the pandemic; (2) establish what data are missing for the chosen estimand; (3) perform the primary analysis under the most plausible missing data assumptions; and (4) perform sensitivity analysis under alternative plausible assumptions. To obtain an estimate of the treatment effect in a 'pandemic-free world', participant data that are clinically affected by the pandemic (directly due to infection or indirectly via treatment disruptions) are not relevant and can be set to missing. For primary analysis, a missing-at-random assumption that conditions on all observed data that are expected to be associated with both the outcome and missingness may be most plausible. For the treatment effect in the 'world including a pandemic', all participant data are relevant and should be included in the analysis. For primary analysis, a missing-at-random assumption (potentially incorporating a pandemic time-period indicator and participant infection status) or a missing-not-at-random assumption with a poorer response may be most relevant, depending on the setting.
In all scenarios, sensitivity analysis under credible missing-not-at-random assumptions should be used to evaluate the robustness of results. We highlight controlled multiple imputation as an accessible tool for conducting sensitivity analyses. CONCLUSIONS: Missing data problems will be exacerbated for trials active during the Covid-19 pandemic. This four-step strategy will facilitate clear thinking about the appropriate analysis for relevant questions of interest.
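Delta adjustment is one common form of controlled multiple imputation; the abstract does not say which form it has in mind. The idea: impute missing outcomes under a missing-at-random model, then shift the imputed values in one arm by a fixed delta to represent a poorer (or better) response than MAR would predict, and repeat over a range of delta values to see how far the assumption must move before conclusions change. This is a hypothetical, simplified sketch: it assumes a continuous outcome where higher is better, one baseline covariate, a two-arm trial, and arm-specific linear imputation models; it draws regression coefficients from an approximate normal posterior but keeps the residual standard deviation fixed. The per-imputation estimates would then be pooled with Rubin's rules.

```python
import numpy as np


def delta_adjusted_estimates(y, baseline, arm, delta, m=20, seed=0):
    """Arm 1 minus arm 0 mean outcome from m delta-adjusted imputations (hypothetical helper).

    Missing outcomes are imputed within each arm from a linear regression on
    baseline (MAR); imputed values in arm 1 are then shifted by `delta`.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    arm = np.asarray(arm, dtype=int)
    estimates = []
    for _ in range(m):
        y_imp = y.copy()
        for a in (0, 1):
            obs = (arm == a) & ~np.isnan(y)
            mis = (arm == a) & np.isnan(y)
            X = np.column_stack([np.ones(obs.sum()), baseline[obs]])
            coef, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
            sd = np.std(y[obs] - X @ coef, ddof=2)
            # Approximate parameter uncertainty: draw coefficients around the OLS fit.
            cov = sd ** 2 * np.linalg.inv(X.T @ X)
            coef_draw = rng.multivariate_normal(coef, (cov + cov.T) / 2)
            pred = coef_draw[0] + coef_draw[1] * baseline[mis]
            y_imp[mis] = pred + rng.normal(0.0, sd, mis.sum())
            if a == 1:
                y_imp[mis] += delta  # MNAR departure; delta < 0 means poorer response
        estimates.append(y_imp[arm == 1].mean() - y_imp[arm == 0].mean())
    return np.array(estimates)


# Hypothetical two-arm trial with five dropouts in arm 1.
rng = np.random.default_rng(1)
n = 40
arm = np.repeat([0, 1], n // 2)
baseline = rng.normal(50, 10, n)
y = 5 + 0.9 * baseline + 3 * arm + rng.normal(0, 2, n)
y[[25, 28, 31, 34, 37]] = np.nan

for delta in (0.0, -2.0, -5.0):
    print(delta, delta_adjusted_estimates(y, baseline, arm, delta).mean())
```

With a fixed seed the random draws are identical across delta values, so the estimates differ only by the deterministic shift, which makes a tipping-point scan easy to read.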


Subject(s)
Outcome Assessment, Health Care/statistics & numerical data , Practice Guidelines as Topic , Randomized Controlled Trials as Topic/statistics & numerical data , Research Design/statistics & numerical data , Betacoronavirus/physiology , COVID-19 , Comorbidity , Coronavirus Infections/epidemiology , Coronavirus Infections/therapy , Coronavirus Infections/virology , Humans , Outcome Assessment, Health Care/methods , Pandemics , Pneumonia, Viral/epidemiology , Pneumonia, Viral/therapy , Pneumonia, Viral/virology , Randomized Controlled Trials as Topic/methods , Reproducibility of Results , SARS-CoV-2
16.
Stat Biopharm Res ; 12(4): 443-450, 2020 Aug 05.
Article in English | MEDLINE | ID: covidwho-707563

ABSTRACT

The COVID-19 pandemic has impacted ongoing clinical trials. We consider particular impacts on noninferiority clinical trials, which aim to show that an investigational treatment is not markedly worse than an existing active control with known benefit. Because interpretation of noninferiority trials requires cross-trial validation involving untestable assumptions, it is vital that they be run to very high standards. The COVID-19 pandemic has had unexpected effects on clinical trials, with subjects possibly missing treatment or assessments due to unforeseen intercurrent events. The resulting data must be carefully considered to ensure proper statistical inference. Missing data can often, but not always, be considered missing completely at random (MCAR). We discuss ways to ensure validity of the analyses through study conduct and data analysis, with a focus on the hypothetical strategy for constructing estimands. We assess various strategies for analyzing longitudinal binary data with dropouts where outcomes may be MCAR or missing at random (MAR). Simulations show that certain multiple imputation strategies control the Type I error rate and provide additional power over analysis of observed data when data are MCAR or MAR, with weaker assumptions about the missing data mechanism.

17.
J Big Data ; 7(1): 37, 2020.
Article in English | MEDLINE | ID: covidwho-597057

ABSTRACT

In data analytics, missing data is a factor that degrades performance, and incorrect imputation of missing values can lead to wrong predictions. In this era of big data, when a massive volume of data is generated every second and utilizing these data is a major concern for stakeholders, handling missing values efficiently becomes more important. In this paper, we propose a new technique for missing data imputation, which is a hybrid of single and multiple imputation techniques. We propose an extension of the popular Multivariate Imputation by Chained Equations (MICE) algorithm in two variations to impute categorical and numeric data. We have also implemented twelve existing algorithms to impute binary, ordinal, and numeric missing values. We collected sixty-five thousand real health records from different hospitals and diagnostic centers in Bangladesh while maintaining data privacy. We also collected three public datasets from the UCI Machine Learning Repository, ETH Zurich, and Kaggle. We compared the performance of our proposed algorithms with existing algorithms using these datasets. Experimental results show that our proposed algorithm achieves a 20% higher F-measure for binary data imputation and 11% less error for numeric data imputation than its competitors, with similar execution time.
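For reference, the standard MICE algorithm that the authors extend can be sketched for numeric data as repeated sweeps of column-wise regression imputation. This is a plain baseline, not the proposed hybrid method: it assumes all columns are numeric, uses ordinary least squares with Gaussian noise for every incomplete variable, starts from mean imputation, and, unlike full implementations, does not draw the regression parameters themselves. Running it with several seeds yields the multiple completed datasets. The function name and data are hypothetical.

```python
import numpy as np


def mice_numeric(data, n_iter=10, seed=0):
    """Return one completed copy of `data` (2-D array, NaN = missing).

    Each sweep regresses every incomplete column on all other columns using
    the rows where that column is observed, then redraws its missing entries
    as prediction + Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    X = np.array(data, dtype=float)  # working copy
    missing = np.isnan(X)
    # Initialise with column means so every regression has complete predictors.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            mis_j = missing[:, j]
            if not mis_j.any():
                continue
            design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            obs_j = ~mis_j
            coef, *_ = np.linalg.lstsq(design[obs_j], X[obs_j, j], rcond=None)
            resid_sd = np.std(X[obs_j, j] - design[obs_j] @ coef)
            X[mis_j, j] = design[mis_j] @ coef + rng.normal(0.0, resid_sd, mis_j.sum())
    return X


# Hypothetical two-variable example; m = 5 completed datasets from 5 seeds.
rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = 2 * a + rng.normal(scale=0.1, size=50)
data = np.column_stack([a, b])
data[::7, 1] = np.nan
completed = [mice_numeric(data, seed=s) for s in range(5)]
```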
